Abstract

The virus that causes severe acute respiratory syndrome (SARS) is a coronavirus; however, it differs
dramatically from all previously known coronaviruses. A peculiar feature is found in the RNA sequence
of SARS: a particular symmetry in its nucleotide composition. Comparison of this symmetry between SARS
and other coronaviruses suggests heuristically that the SARS coronavirus might have come from the avian
infectious bronchitis virus or the porcine epidemic diarrhea virus.
1. Introduction

The complete genome sequence of the SARS
virus shows that SARS coronavirus (SARS-CoV)
is substantially different from all three
previously known groups of coronaviruses
[1-2]. Comparison of the predicted amino acid
sequences of three well-defined enzymatic
proteins and the four major structural proteins
of SARS-CoV with those of representative
coronaviruses shows that SARS-CoV forms a
distinct group within the genus of
coronaviruses [3]. Marra et al. obtained a
similar result from an analysis of different
SARS-CoV isolates [4].

Until now, the origin and evolutionary history
of SARS-CoV have remained unclear, and all
analyses have been based on homologous
alignment of gene sequences. Tuen et al. assumed
that SARS-CoV evolved from a virus that was
relatively innocuous or caused only slight
symptoms in humans, but that several mutations
changing the virus tropism occurred in some
animal carrier (e.g. palm civets) and made the
virus deadly to humans [5], because
high-frequency homologous recombination often
takes place when coronaviruses replicate [6-7].
In addition, Guan et al. found viruses in wild
animals whose sequences are up to 99.8%
homologous to SARS-CoV [8]. Many researchers
have suggested that SARS-CoV has a history of
recombination among different coronaviruses.
For example, Stavrinides et al. reported that
the SARS virus is in fact a mosaic of mammalian
and avian-like viruses and that the
recombination between the parent viruses may
have occurred in the host-determining S gene
[9]. Zhang et al. employed 7 recombination
detection techniques, conducted a phylogenetic
analysis, and found that 7 putative
recombination regions exist between SARS-CoV
and 6 other coronaviruses:


* Xuan Xiao and Jin-Song Yao contribute equally to this work

** Corresponding author. Bioinformatics Research Center, Donghua University, Shanghai 200051, China
Email: shshaofy dhu edu.cn 






X. Xiao, et al. Particular Symmetry in RNA sequence of SARS and the Origin of SARS Coronavirus 


porcine epidemic diarrhea virus (PEDV), 
transmissible gastroenteritis virus (TGEV), 
bovine coronavirus (BCoV), human coronavirus 
229E (HCoV), murine hepatitis virus (MHV), 
and avian infectious bronchitis virus (IBV) [10]. 
There were also claims that the SARS virus is
not a host-range mutant of any previously
described coronavirus because of its low
sequence identity to known coronaviruses
[11-12]. Holmes et al. pointed out that the
phylogenetic patterns cited as evidence for
recombination are more probably caused by
variation in substitution rate among lineages,
and that recombination cannot explain the
origin of SARS-CoV [13].

Sequence alignment is not suitable for
constructing phylogenetic trees when the
aligned regions are characterized by low
consistency or variable length [14-15]. Because
the comparability between SARS-CoV and other
coronaviruses is low, a method that does not
rely on alignment is needed to investigate the
origin of SARS-CoV. In this paper, a visual
method of particular symmetry is used to
analyze all the full-length RNA genomes of
known coronavirus strains, and we find a new
sequence characteristic of SARS-CoV.

2. Particular Symmetry 

Analyzing the 153 published SARS-CoV genomes
and 24 other coronavirus genomes (all
downloaded from the National Center for
Biotechnology Information) with the method
introduced in the following section, we
discover a special characteristic of SARS-CoV.
In five sections of the SARS-CoV genome
sequence near the 5'-terminal, from about 3232
to 5624 nt, 5703 to 7195 nt, 12128 to 14470 nt,
16444 to 19231 nt, and 19728 to 21803 nt, the
number of adenine (A) is almost equal to the
number of thymine (T); A is mostly concentrated
in the 5'-terminal part of each segment and T
in the 3'-terminal part. Because A and T form a
complementary pair in the double-helix
structure, this characteristic is named
particular symmetry. In all the other
coronaviruses, this characteristic does not
exist in the same regions, and the number of T
is obviously larger than that of A. Only the
PEDV and IBV genome sequences show a similar
distribution of A and T in the region from 3232
to 5624 nt. The ratio of T/A in the entire
SARS-CoV genome sequence is also close to that
in PEDV and IBV.
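
The T/A ratio of a genome segment, as discussed above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `ta_ratio` and `segment_ratios` are hypothetical, the coordinates are assumed to be 1-based and inclusive (the text does not say), and an RNA sequence written with U is treated as if U were T.

```python
# Approximate SARS-CoV segment boundaries quoted in the text (in nt).
# Assumption: positions are 1-based and inclusive.
SARS_SEGMENTS = [
    (3232, 5624),
    (5703, 7195),
    (12128, 14470),
    (16444, 19231),
    (19728, 21803),
]


def ta_ratio(genome, start, end):
    """Return the T/A count ratio of genome[start..end] (1-based, inclusive)."""
    segment = genome.upper().replace("U", "T")[start - 1:end]
    a_count = segment.count("A")
    t_count = segment.count("T")
    # A segment without any A has no finite ratio.
    return t_count / a_count if a_count else float("inf")


def segment_ratios(genome, segments=SARS_SEGMENTS):
    """Return the T/A ratio of every (start, end) segment in the list."""
    return [ta_ratio(genome, start, end) for start, end in segments]


if __name__ == "__main__":
    # Toy sequence, not real viral data: two 4-nt segments.
    toy = "AATT" + "ATTT"
    print(segment_ratios(toy, [(1, 4), (5, 8)]))  # [1.0, 3.0]
```

A ratio close to 1 in a segment corresponds to the near-equal A and T counts described above; a ratio clearly above 1 corresponds to the excess of T reported for the other coronaviruses.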


3. Method 

DNA sequences are shaped by natural
selection. J.H. He [16] first assigns a
quaternary digit (0, 1, 2, 3) to adenine (A),
cytosine (C), guanine (G), and thymine (T),
respectively; see Tab. 1. J.H. He [16] also
assigns A, C, T, and G one of the following
numbers: 00, 01, 10, and 11 (see Tab. 2). For
example

1101010010101011010101010111101

might represent a DNA sequence of a particular
genome.


Tab. 1 Possible quaternary values for
A, C, T, and G [16].
The number of A-T pairs can determine the
degree of stability of a nucleotide chain; it
can also be used to distinguish species,
because the genome sequence varies as species
evolve. The closer the phylogenetic
relationship between two species, the more
similar the proportion of AT content in their
genome sequences should be.

In this paper, a visualization method of AT
content is presented. With this method, a new
sequence characteristic of SARS-CoV is found.
Our method circumvents the difficulty of
sequence alignment and avoids any bias that may
be associated with particular genomic regions.
A nucleotide sequence is coded as follows:

A = -1, C = 0, G = 0, T = 1    (1)

Through the above encoding procedure, a
gene sequence is transformed into a series of
digital signals. For example, the sequence
"AATGCTGG" is coded as the discrete digital
sequence P = "-1,-1,1,0,0,1,0,0".

The second step is to use the sum function:

$$ S(p, j) = \sum_{i=1}^{j} p_i, \qquad j = 1, 2, \ldots, n \qquad (2) $$

where p_i is the value at the i-th position of
the coded sequence and n is the length of the
nucleotide chain.
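
The two steps of the method, the coding of Eq. (1) and the running sum of Eq. (2), can be sketched as follows. This is a minimal illustration rather than the authors' program: the names `encode` and `cumulative_curve` are hypothetical, U is treated as T for RNA input, and any other symbol (for example an ambiguous base N) is coded as 0, which is an assumption not stated in the text.

```python
from itertools import accumulate

# Coding of Eq. (1): A = -1, C = 0, G = 0, T = 1.
CODE = {"A": -1, "C": 0, "G": 0, "T": 1}


def encode(sequence):
    """Turn a nucleotide string into the digital sequence p_1, ..., p_n."""
    normalized = sequence.upper().replace("U", "T")
    # Assumption: symbols outside A, C, G, T contribute 0.
    return [CODE.get(base, 0) for base in normalized]


def cumulative_curve(sequence):
    """Return S(p, j) for j = 1, ..., n: the running sum of the coded values."""
    return list(accumulate(encode(sequence)))


if __name__ == "__main__":
    print(encode("AATGCTGG"))            # [-1, -1, 1, 0, 0, 1, 0, 0]
    print(cumulative_curve("AATGCTGG"))  # [-1, -2, -1, -1, -1, 0, 0, 0]
```

Plotting the returned list against the position j gives a curve of the kind compared in Fig. 1: it falls where A dominates, rises where T dominates, and stays level across C and G.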

Fig. 1 shows the relationship between S and
n for several kinds of coronavirus. The curve
of SARS-CoV is clearly distinct from those of
the other groups of coronaviruses.

4. Discussion and Conclusion 

Using the visualization method of AT content
described above, we analyze the different parts
of the full-length sequence curves of all 153
known SARS-CoV isolates and 24 other
coronavirus isolates obtained from GenBank.
Five segments of the SARS-CoV curve are
relatively flat, whereas the curves of the other
coronaviruses rise almost all the time. The
particular symmetry of SARS-CoV was thus
obtained from Figure 1. In the SARS-CoV
sequence near the 5'-terminal, from about 3232
to 5624 nt, 5703 to 7195 nt, 12128 to 14470 nt,
16444 to 19231 nt, and 19728 to 21803 nt, the
number of A is almost equal to the number of T;
the average T/A ratios are 1.002, 1.005, 1.007,
1.004, and 1.001, respectively. In Figure 1,
these five segments of the SARS-CoV curve have
an upward concave shape. This indicates that A
is rich in the 5'-terminal part of each
segment, because the curve goes down in the
front part, and that T is rich in the latter
part, because the curve goes up there. In the
same five segments of the other coronaviruses,
however, the number of T is greater than that
of A. Their T/A ratios are mostly near 1.2,
with averages of 1.256, 1.300, 1.198, 1.194,
and 1.221, respectively, so the other
coronaviruses do not share this character of
SARS-CoV. The statistical averages of the
particular symmetry of all coronaviruses in the
five segments are shown in Tab. 3. It should be
emphasized that regions with A = T also exist
elsewhere in the SARS-CoV curve, but these
regions either do not exceed 1200 nt in length
or do not satisfy the condition that A lies
mostly in the front part and T mostly in the
rear part.
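
The conditions that a segment must meet to count as particular symmetry can be written down as a small check. The sketch below is one possible reading, not the authors' procedure: the function name `has_particular_symmetry` is hypothetical, the tolerance of 0.01 on the T/A ratio is an assumed threshold (the text only says "almost equal"), and "mostly in the front part" is interpreted here as "more than half of the A's lie in the first half of the segment".

```python
def has_particular_symmetry(segment, min_length=1200, tolerance=0.01):
    """Check the three conditions described in the text for one segment.

    Assumptions (not fixed by the text): the segment is split at its
    midpoint, and the T/A ratio must lie within `tolerance` of 1.
    """
    segment = segment.upper().replace("U", "T")
    a_total = segment.count("A")
    t_total = segment.count("T")

    # Condition 1: the segment must be longer than 1200 nt.
    if len(segment) <= min_length or a_total == 0:
        return False

    # Condition 2: the number of A is almost equal to the number of T.
    balanced = abs(t_total / a_total - 1.0) <= tolerance

    # Condition 3: A mostly in the front part, T mostly in the rear part.
    half = len(segment) // 2
    front, rear = segment[:half], segment[half:]
    a_in_front = front.count("A") > rear.count("A")
    t_in_rear = rear.count("T") > front.count("T")

    return balanced and a_in_front and t_in_rear


if __name__ == "__main__":
    # Toy sequences, not real viral data.
    symmetric = "A" * 700 + "C" * 100 + "T" * 700
    alternating = "AT" * 750
    print(has_particular_symmetry(symmetric))    # True
    print(has_particular_symmetry(alternating))  # False
```

The alternating toy sequence has A = T but fails the third condition, which mirrors the remark above that equal counts alone are not enough.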

Among the coronaviruses, PEDV and IBV have
T/A ratios of 1.026 and 0.994, respectively, in
the first interval from 3232 to 5624 nt; these
values are obviously close to the SARS-CoV T/A
ratio in the same region. PEDV also has a close
T/A ratio of 1.096 in the fourth interval
between 16444 and 19231 nt. From 2408 to
5794 nt in the IBV sequence and from 3223 to
6160 nt in the PEDV sequence near the
5'-terminal, the number of A is almost equal to
the number of T, with A mostly in the former
part and T in the latter part. IBV and PEDV
thus have a similar particular symmetry. The
other coronaviruses do not have this character,
which suggests that SARS-CoV is closer to IBV
and PEDV. This result is consistent with other
reports. For example, Stavrinides and Guttman
[9] applied Bayesian, neighbor-joining, and
split decomposition phylogenetic techniques to
the SARS replicase, surface spike, matrix, and
nucleocapsid proteins to reveal the origin of
SARS. Their analyses support an avian-like
origin for the matrix and nucleocapsid proteins
and a mammalian-avian mosaic origin for the
host-determining spike protein. Qi et al.
compared the entire sequences of 12 SARS-CoV
isolates with 12 other coronaviruses using a
method based on the function of degree of
disagreement and suggested that SARS-CoV is
closely related to the group 1 coronaviruses
[17].

Symmetry is an essential character of DNA.
In double-helix DNA strands, A = T and G = C,
and the phenomena A ≈ T and G ≈ C also appear
inside a single strand [18,19]. Sueoka
suggested a hypothesis to explain this
symmetry: if the selection pressure and the
natural mutation of the two strands were equal,
the phenomena A ≈ T and G ≈ C would appear in a
single strand after a long period of evolution
[21]. This shows that evolution tends toward
symmetry. The A ≈ T and G ≈ C symmetry exists
in most organisms whose genomes have been
sequenced, and the longer the sequence, the
more precisely it holds. The ratios of A/T and
C/G fluctuate between 0.999 and 1.001 in each
human chromosome.

It is clear from Figure 1 that SARS-CoVs
have more particular symmetry of A ≈ T than
other coronaviruses, and the ratio of T/A in
the entire SARS-CoV genome sequence is also
closer to 1 than that of other coronaviruses.
Together, these observations show that SARS-CoV
evolved from other coronaviruses, and it is most
likely that SARS-CoV is closely related to PEDV
and IBV.

We should emphasize that symmetry is a
natural character of the particle world [20],
and of biology [22,23,24] as well. We will
study the particular symmetry in SARS further
in the future, with emphasis on the less
understood characters of SARS.



Figure 1. Coronavirus curves based on the visualization method of AT content. Curve 1 is human
coronavirus NL63, 2 bovine coronavirus, 3 human coronavirus OC43, 4 porcine epidemic
diarrhea virus, 5 human coronavirus 229E, 6 murine hepatitis virus, 7 avian infectious bronchitis virus, 8
transmissible gastroenteritis virus, 9 SARS coronavirus TW1.


Tab. 3. Statistical averages of the particular symmetry of all coronaviruses in the five segments of
the sequences.